Using Noisy Speech to Study the Robustness of a Continuous F0 Modelling Method in HMM-based Speech Synthesis

Authors

Kalu U. Ogbureke

João P. Cabral

Julie Carson-Berndsen

Abstract

In parametric text-to-speech synthesis using hidden Markov models (HMMs), modelling the fundamental frequency (F0) parameter is important because it has a direct effect on the prosody of synthetic speech. F0 is typically modelled with a multi-space distribution (MSD): a discrete distribution for unvoiced speech and a continuous distribution for voiced speech. However, F0 modelling using MSD-HMM is not accurate around voiced-unvoiced (V-UV) and unvoiced-voiced (UV-V) transitions, and it is affected by voicing decision errors of the F0 estimation algorithm. To reduce this problem, HMM-based speech synthesisers have been proposed that model F0 using continuous HMMs. This approach usually obtains continuous F0 contours by interpolating F0 in unvoiced regions. The problem with this method is that it is affected by voicing decision errors during speech analysis. For example, if voiced speech segments are incorrectly classified as unvoiced, the F0 contour in this region is obtained by interpolation, which might be a poor estimate of the natural F0. This paper proposes to use an F0 estimation method that does not require a hard voiced/unvoiced classification and produces a reasonably smooth F0 contour. The robustness of this method was studied under two conditions: high-quality recorded speech and recorded speech with additive noise. The motivation for using noisy speech was to study the effect of voicing decision errors on the quality of the synthetic speech.
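The interpolation step described in the abstract, and the way a voicing error can distort it, can be illustrated with a minimal sketch. This is not the authors' method: it assumes plain linear interpolation, assumes unvoiced frames are marked with 0.0, and uses a hypothetical function name (`interpolate_f0`) and made-up F0 values purely for illustration.

```python
def interpolate_f0(f0):
    """Return a continuous F0 contour from a frame-level track.

    Assumption: frames classified as unvoiced are marked with 0.0.
    Gaps between voiced frames are filled by linear interpolation;
    leading and trailing unvoiced frames hold the nearest voiced value.
    """
    voiced = [i for i, value in enumerate(f0) if value > 0.0]
    if not voiced:
        raise ValueError("no voiced frames to interpolate from")

    out = list(f0)
    first, last = voiced[0], voiced[-1]

    # Hold the nearest voiced value at both edges of the utterance.
    for i in range(first):
        out[i] = f0[first]
    for i in range(last + 1, len(f0)):
        out[i] = f0[last]

    # Linearly interpolate across each gap between consecutive voiced frames.
    for left, right in zip(voiced, voiced[1:]):
        span = right - left
        for i in range(left + 1, right):
            w = (i - left) / span
            out[i] = (1.0 - w) * f0[left] + w * f0[right]
    return out


# Hypothetical natural contour (Hz) with a pitch peak in the middle.
natural = [100.0, 110.0, 130.0, 110.0, 100.0]
# The same frames after a voicing error marks the middle three as unvoiced.
tracked = [100.0, 0.0, 0.0, 0.0, 100.0]

continuous = interpolate_f0(tracked)
peak_error = abs(natural[2] - continuous[2])
print(continuous, peak_error)
```

In this toy case the interpolated contour is flat between the two surviving voiced frames, so the pitch peak in the misclassified region is lost. This is the kind of poor estimate the abstract refers to when voiced segments are wrongly classified as unvoiced.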

Similar resources

From discontinuous to continuous F0 modelling in HMM-based speech synthesis

The accurate modelling of fundamental frequency, or F0, in HMM-based speech synthesis is a critical factor in achieving high quality speech. However, it is also difficult because F0 values are normally considered to depend on a binary voicing decision such that they are continuous in voiced regions and undefined in unvoiced regions. A widely used solution is to use a multi-space probability dis...

Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a maximum a posteriori (MAP) estimator based on Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...

Generation of Fundamental Frequency Contours of Mandarin in HMM-based Speech Synthesis using Generation Process Model

The HMM-based speech synthesis system can produce high-quality synthetic speech with flexible modeling of spectral and prosodic parameters. In this approach, short-term spectra, fundamental frequency (F0) and duration are generated by multi-stream HMMs separately. However, the quality of synthetic speech degrades when feature vectors used in training are noisy. Among all noisy features, pitch tr...

Discontinuous Observation HMM for Prosodic-Event-Based F0 Generation

This paper examines F0 modeling and generation techniques for spontaneous speech synthesis. In a previous study, we proposed a prosodic-unit HMM where the synthesis unit is defined as a segment between two prosodic events represented by a ToBI label framework. To take advantage of the prosodic-unit HMM, continuous F0 sequences must be modeled from discontinuous F0 data including unvoiced r...

Integration of Intonation in F0 Trajectory prediction using MSD-HMMs

Current research in speech synthesis places more and more emphasis on spectral continuity and diverse prosodic effects. The trainable HMM-based speech synthesis method tends to generate more continuous spectral structures than the traditional unit selection method. However, the F0 trajectory generated by HMM-based speech synthesis is often excessively smoothed and lacks prosodic variance. Th...

Publication date: 2012
